Robust Discriminative Clustering with Sparse Regularizers
Authors
Abstract
Clustering high-dimensional data often requires some form of dimensionality reduction, where clustered variables are separated from “noise-looking” variables. We cast this problem as finding a low-dimensional projection of the data which is well-clustered. This yields a one-dimensional projection in the simplest situation with two clusters, and extends naturally to a multi-label scenario for more than two clusters. In this paper, (a) we first show that this joint clustering and dimension reduction formulation is equivalent to previously proposed discriminative clustering frameworks, thus leading to convex relaxations of the problem; (b) we propose a novel sparse extension, which is still cast as a convex relaxation and allows estimation in higher dimensions; (c) we propose a natural extension for the multi-label scenario; (d) we provide a new theoretical analysis of the performance of these formulations with a simple probabilistic model, leading to scalings of the form d = O(√n) for the affine invariant case and d = O(n) for the sparse case, where n is the number of examples and d the ambient dimension; and finally, (e) we propose an efficient iterative algorithm with running-time complexity proportional to O(nd), improving on earlier algorithms for discriminative clustering with the square loss, which had quadratic complexity in the number of examples.
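The paper's convex relaxation and O(nd) algorithm are only summarized above and not spelled out here, so the snippet below is a minimal, hypothetical NumPy sketch of the two-cluster idea: alternate between fitting a sparse linear projection to the current ±1 labels (square loss with ℓ1/ℓ2 penalties, solved by ISTA) and re-labeling points by the sign of the one-dimensional projection. The function name, penalties, and default parameters are illustrative assumptions, not the authors' method.

```python
import numpy as np

def sparse_discriminative_clustering(X, lam=0.05, mu=0.01, n_outer=50, n_inner=100, seed=0):
    """Illustrative alternating heuristic (not the paper's convex relaxation):
    (i) fit a sparse direction w to the current +/-1 labels with
        min_w 1/(2n)||y - Xc w||^2 + mu/2 ||w||_2^2 + lam ||w||_1   (via ISTA),
    (ii) re-assign labels from the sign of the 1-D projection Xc @ w."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)                    # center the data
    y = np.sign(rng.standard_normal(n))        # random +/-1 initialization
    w = np.zeros(d)
    L = np.linalg.norm(Xc, 2) ** 2 / n + mu    # Lipschitz constant of the smooth part
    for _ in range(n_outer):
        for _ in range(n_inner):               # ISTA for the sparse least-squares fit
            grad = Xc.T @ (Xc @ w - y) / n + mu * w
            w = w - grad / L
            w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft-thresholding
        y_new = np.sign(Xc @ w)                # cluster labels from the projection
        y_new[y_new == 0] = 1.0
        if np.array_equal(y_new, y):           # stop when the labeling is stable
            break
        y = y_new
    return y, w
```

The alternating scheme is only a heuristic stand-in: the paper's actual contribution is to relax the combinatorial label assignment into a convex problem rather than alternating between labels and projections.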
Similar Resources
Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python
We describe a new library named picasso, which implements a unified framework of pathwise coordinate optimization for a variety of sparse learning problems (e.g., sparse linear regression, sparse logistic regression, sparse Poisson regression and sparse square root loss linear regression), combined with efficient active set selection strategies. Besides, the library allows users to choose diffe...
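The blurb above describes picasso only at a high level and its exact R/Python API is not given here, so rather than guess at library calls, the sketch below shows the core idea in plain NumPy: pathwise coordinate descent for the lasso over a decreasing grid of regularization parameters with warm starts (the active-set selection strategies mentioned above are omitted). The function name and defaults are illustrative assumptions.

```python
import numpy as np

def lasso_path_cd(X, y, n_lambda=10, lam_min_ratio=0.01, n_sweeps=200, tol=1e-6):
    """Pathwise coordinate descent for 1/(2n)||y - Xw||^2 + lam*||w||_1,
    solved over a decreasing grid of lam values with warm starts."""
    n, d = X.shape
    col_sq = (X ** 2).sum(axis=0) / n               # per-coordinate curvature
    lam_max = np.max(np.abs(X.T @ y)) / n           # smallest lam with all-zero solution
    lambdas = lam_max * np.logspace(0, np.log10(lam_min_ratio), n_lambda)
    w = np.zeros(d)
    r = y.astype(float).copy()                      # residual y - Xw (w = 0 initially)
    path = []
    for lam in lambdas:                             # warm-start from the previous solution
        for _ in range(n_sweeps):
            max_delta = 0.0
            for j in range(d):
                if col_sq[j] == 0.0:                # skip all-zero columns
                    continue
                w_old = w[j]
                rho = X[:, j] @ r / n + col_sq[j] * w_old
                w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
                if w[j] != w_old:
                    r -= X[:, j] * (w[j] - w_old)   # keep the residual in sync
                    max_delta = max(max_delta, abs(w[j] - w_old))
            if max_delta < tol:                     # coordinate sweep converged
                break
        path.append(w.copy())
    return lambdas, np.array(path)
```

Warm-starting each problem from the previous solution is what makes computing the whole path cheap; picasso combines the same idea with active-set selection and extends it to the other losses listed above.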
A Joint Optimization Framework of Sparse Coding and Discriminative Clustering
Many clustering methods highly depend on extracted features. In this paper, we propose a joint optimization framework in terms of both feature extraction and discriminative clustering. We utilize graph regularized sparse codes as the features, and formulate sparse coding as the constraint for clustering. Two cost functions are developed based on entropy-minimization and maximum-margin clusterin...
Robust Estimation in Linear Regression with Multicollinearity and Sparse Models
One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...
High-dimensional Inference via Lipschitz Sparsity-Yielding Regularizers
Non-convex regularizers are more and more applied to high-dimensional inference with sparsity prior knowledge. In general, non-convex regularizers are superior to convex ones in inference, but they suffer from the difficulties brought by local optima and massive computation. A "good" regularizer should perform well in both inference and optimization. In this paper, we prove that some non-convex...
Discriminative Transformation Learning for Fuzzy Sparse Subspace Clustering
This paper develops a novel iterative framework for subspace clustering (SC) in a learned discriminative feature domain. This framework consists of two modules of fuzzy sparse SC and discriminative transformation learning. In the first module, fuzzy latent labels containing discriminative information and latent representations capturing the subspace structure will be simultaneously evaluated in...
Journal: Journal of Machine Learning Research
Volume: 18
Pages: -
Year of publication: 2017